Mining dates from historical documents
Authors
Abstract
The essential quality of information in a digital library is accessibility. For some collections full-text search is not enough, and more can be done. Historical collections, for example, contain dates, and historians would find it useful to search by them. However, these dates can occur anywhere within the text of historical documents, so to be searchable they must be extracted from the documents and integrated into the collection index. Doing this manually is very expensive; this paper describes the development of a system that does it automatically. The system was implemented within the Greenstone framework used by the New Zealand Digital Library, and involved the creation of some carefully designed heuristics.
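As a rough illustration of the idea in the abstract (pulling dates out of running text and recording them in an index), here is a minimal Python sketch. It is not the system described in the paper and does not use the Greenstone API: the regular expressions, the year range, and the names `extract_dates` and `build_date_index` are all assumptions made for this example, and real historical text would need far more careful heuristics.

```python
import re

# Hypothetical patterns chosen for illustration only; these are NOT the
# heuristics developed in the paper.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
MONTH_NUMBER = {name: i + 1 for i, name in enumerate(MONTHS.split("|"))}

# "6 February 1840"
DAY_MONTH_YEAR = re.compile(rf"\b(\d{{1,2}})\s+({MONTHS})\s+(\d{{4}})\b")
# "February 6, 1840" or "February 6 1840"
MONTH_DAY_YEAR = re.compile(rf"\b({MONTHS})\s+(\d{{1,2}}),?\s+(\d{{4}})\b")
# A four-digit year on its own; the 1500-2099 range is an arbitrary assumption.
BARE_YEAR = re.compile(r"\b(1[5-9]\d{2}|20\d{2})\b")


def extract_dates(text):
    """Return sorted (year, month, day) tuples; month and day are 0 if unknown."""
    found = set()
    covered = []  # character spans already claimed by a full date

    for match in DAY_MONTH_YEAR.finditer(text):
        day, month, year = match.groups()
        if 1 <= int(day) <= 31:
            found.add((int(year), MONTH_NUMBER[month], int(day)))
            covered.append(match.span())

    for match in MONTH_DAY_YEAR.finditer(text):
        month, day, year = match.groups()
        if 1 <= int(day) <= 31:
            found.add((int(year), MONTH_NUMBER[month], int(day)))
            covered.append(match.span())

    # Record a bare year only when it is not part of a full date found above.
    for match in BARE_YEAR.finditer(text):
        if not any(start <= match.start() < end for start, end in covered):
            found.add((int(match.group(1)), 0, 0))

    return sorted(found)


def build_date_index(documents):
    """Map each extracted date to the set of document ids that mention it."""
    index = {}
    for doc_id, text in documents.items():
        for date in extract_dates(text):
            index.setdefault(date, set()).add(doc_id)
    return index
```

Given a dictionary of document ids and their text, `build_date_index` returns a mapping from dates to the documents that mention them, which is the kind of structure a search-by-date feature could consult. A simple pattern set like this will both miss dates (abbreviations, ordinals, spelled-out numbers) and accept false positives (any four-digit number in range), which is presumably why the abstract stresses carefully designed heuristics.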
Similar resources
Mining Frequently Changing Substructures from Historical Unordered XML Documents
Recently, there have been increasing research efforts in XML data mining. These efforts have largely assumed that XML documents are static. However, in many real applications, XML data are evolutionary in nature. In this paper, we focus on mining evolution patterns from historical XML documents. Specifically, we propose a novel approach to discover frequently changing structures (FCS) from a sequence of...
Word Segmentation of Handwritten Dates in Historical Documents by Combining Semantic A-Priori-Knowledge with Local Features
The recognition of script in historical documents requires suitable techniques in order to identify single words. Segmentation of lines and words is a challenging task because lines are not straight and words may intersect within and between lines. For correct word segmentation, the conventional analysis of distances between text objects needs to be supplemented by a second component predicting...
A Retrieval Language for Historical Documents
This paper focuses on a set of structured document applications that we have denoted databases of historical documents. The information in these documents is closely related to the time at which they were created, while still being of great use in the future. The main contribution of this paper is the formulation of a group of operators and predicates that express retrieval conditions ov...
Information Access to Historical Documents from the Early New High German Period
With the new interest in historical documents, the insight has grown that electronic access to these texts causes many specific problems. In the first part of the paper we survey the present role of digital historical documents. After collecting central facts and observations on historical language change, we comment on the difficulties that result for retrieval and data mining on historical texts. In the...
Data Mining and Serial Documents
This paper is concerned with the investigation of the relevance and suitability of the data mining approach to serial documents. Conceptually the paper is divided into three parts. The first part presents the salient features of data mining and its symbiotic relationship to data warehousing. In the second part of the paper, historical serial documents are introduced, and the Ottoman Tax Registe...